Information on data:

The following data describes tornado building damage in New Orleans during December 2022. The data was obtained from Verisk Analytics and was derived using computer vision and machine learning applied to post-catastrophe aerial imagery. There are approximately 42,000 buildings in this dataset.


Before and after:

Here are some interactive before-and-after aerial images:

This is an example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)


This is another example of a building that has a catastrophe score of 100 (FEMA 6 / Destroyed)


This is an example of a building that has a catastrophe score of approximately 60 (FEMA 4 / Major)

Clean data:

I converted roofsolar into a logical (TRUE/FALSE) column by mapping “SOLAR PANEL” to TRUE and “NO SOLAR PANEL” to FALSE. I also set to NA the roof shapes that the computer wasn’t very sure about (a confidence score below 0.80, i.e. more than a 20% chance of being incorrect). Some cells in damage_level contained an empty string, so I converted those to NA as well. Finally, I split the rooftop geometry into separate longitude and latitude columns so that it could be read easily into leaflet.

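library(dplyr)  # %>%, mutate(), case_when(), select(), filter()
library(tidyr)  # separate()

df <- read.csv("clean_data.csv") %>% 
  janitor::clean_names() %>% 
  # "SOLAR PANEL" -> TRUE, "NO SOLAR PANEL" -> FALSE (anything else stays NA)
  mutate(roofsolar = case_when(roofsolar == "SOLAR PANEL" ~ TRUE,
                               roofsolar == "NO SOLAR PANEL" ~ FALSE)) %>%
  # set roof shapes with a confidence score below 0.80 to NA
  mutate(roofshape = ifelse(roofshascr < 0.80, NA, roofshape)) %>%
  select(-c(roofshascr, roofcondit_discolordetect, roofcondit_discolorscore, roofcondit_discolorpercen, trampscr, roofcondit_tarppercen))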

df$rooftopgeo <- gsub("POINT \\(|\\)", "", df$rooftopgeo)

df <- df %>%
  separate(rooftopgeo, into = c("long", "lat"), sep = " ", convert = TRUE)

df$damage_level <- ifelse(df$damage_level == "", NA, df$damage_level)
df$roofshape <- factor(df$roofshape, levels = c("gable", "hip", "flat"))
# Only these levels are kept; any other material (e.g. "gravel") becomes NA
levels_roofmateri <- c("metal", "shingle", "membrane", "shake", "tile")
df$roofmateri <- factor(df$roofmateri, levels = levels_roofmateri)
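The cleaning steps above can be tried on a toy data frame. This is a minimal sketch in base R (so it runs without dplyr or tidyr). The three rows and all of their values are made up for illustration; only the column names come from the code above.

```r
# Toy rows (hypothetical values) with the columns cleaned above
toy <- data.frame(
  roofsolar    = c("SOLAR PANEL", "NO SOLAR PANEL", "NO SOLAR PANEL"),
  roofshape    = c("gable", "hip", "flat"),
  roofshascr   = c(0.95, 0.60, 0.85),
  rooftopgeo   = c("POINT (-90.01 29.95)", "POINT (-90.02 29.96)",
                   "POINT (-90.03 29.97)"),
  damage_level = c("Destroyed", "", "Major"),
  stringsAsFactors = FALSE
)

# "SOLAR PANEL" -> TRUE, "NO SOLAR PANEL" -> FALSE
toy$roofsolar <- toy$roofsolar == "SOLAR PANEL"

# Roof shapes with a confidence score below 0.80 become NA
toy$roofshape[toy$roofshascr < 0.80] <- NA

# Empty damage levels become NA
toy$damage_level[toy$damage_level == ""] <- NA

# "POINT (long lat)" -> two numeric columns
coords <- gsub("POINT \\(|\\)", "", toy$rooftopgeo)
parts  <- strsplit(coords, " ", fixed = TRUE)
toy$long <- as.numeric(sapply(parts, function(p) p[1]))
toy$lat  <- as.numeric(sapply(parts, function(p) p[2]))

toy
```

The comparison `roofsolar == "SOLAR PANEL"` is a shortcut that is only safe when the column holds exactly those two strings; the two-branch case_when leaves any unexpected value as NA instead.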

Define damage categories:

Catastrophe scores are grouped into damage categories using cutoffs based on the summary statistics of the dataset, excluding catastrophe scores of 0.

mostdamage <- df %>% filter(catastrophescore >= 50)
nodamage <- df %>% filter(catastrophescore == 0)
decimated <- df %>% filter(catastrophescore == 100)
middamage <- df %>% filter(catastrophescore < 50 & catastrophescore >= 15)
leastdamage <- df %>% filter(catastrophescore < 15 & catastrophescore >= 2)
minimaldamage <- df %>% filter(catastrophescore == 1)
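As an alternative to separate data frames, the same cutoffs could be stored in one labeled column. This is a minimal sketch using base R's cut(), assuming catastrophe scores are whole numbers from 0 to 100. The function name damage_category and the category labels are hypothetical. Destroyed buildings (a score of 100) fall inside the "most" bin here, just as decimated is a subset of mostdamage above.

```r
# Hypothetical helper: bin catastrophe scores with the cutoffs used above.
# Intervals are closed on the left: [0,1), [1,2), [2,15), [15,50), [50,Inf)
damage_category <- function(score) {
  cut(
    score,
    breaks = c(0, 1, 2, 15, 50, Inf),
    labels = c("none", "minimal", "least", "mid", "most"),
    right  = FALSE
  )
}

scores <- c(0, 1, 2, 14, 15, 49, 50, 100)
table(damage_category(scores))
```

A single factor column like this can be passed straight to filter() or used to color map points, at the cost of losing the separate "destroyed" group unless it is added as its own flag.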

Damage maps:

NOTE: Red indicates the buildings that were the most damaged (catastrophe score >= 50), orange indicates moderate damage (25 < catastrophe score < 50), and blue indicates the least damage (catastrophe score <= 25, excluding scores of 0). Only 3,852 buildings had a nonzero catastrophe score, so the majority of the buildings (37,967) had a catastrophe score of 0.
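The color rule in the note can be written as a small lookup. This is a sketch, not the code used for the maps: the function name score_color is hypothetical, the toy coordinates are made up, and gray for scores of 0 is an assumption because the note does not assign a color to undamaged buildings. The leaflet calls only run if the leaflet package is installed.

```r
# Hypothetical helper: map a catastrophe score to the colors in the note
score_color <- function(score) {
  ifelse(score >= 50, "red",
  ifelse(score > 25,  "orange",
  ifelse(score > 0,   "blue", "gray")))  # gray for 0 is an assumption
}

# Toy points (made-up coordinates and scores)
pts <- data.frame(
  long = c(-90.01, -90.02, -90.03, -90.04),
  lat  = c(29.95, 29.96, 29.97, 29.98),
  catastrophescore = c(0, 10, 40, 100)
)
pts$color <- score_color(pts$catastrophescore)

if (requireNamespace("leaflet", quietly = TRUE)) {
  m <- leaflet::leaflet(pts)
  m <- leaflet::addTiles(m)
  m <- leaflet::addCircleMarkers(m, lng = ~long, lat = ~lat,
                                 color = ~color, radius = 4)
  # Typing m at the console renders the interactive map
}
```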

All points:

This map shows all of the catastrophe scores. The vast majority of roofs have no damage.


Least damage:

Map of the buildings that experienced the least damage:

Mid damage:

Map of the buildings that experienced mid damage:

Most damage:

Map of the buildings that experienced the most damage (interactive!):

No damage:

Map of the buildings that experienced no damage:

Destroyed:

Map of the buildings that were completely destroyed:


Models:

Since most of the buildings in this dataset were not damaged by the tornado, the distribution of catastrophe scores is heavily skewed toward 0. This can be seen in the summary below:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   2.217   0.000 100.000

Check models: Extra

Because of this skew, I fit models that excluded catastrophe scores of 0, so that they only consider structures that experienced damage. Below is the summary for the structures that exhibited damage:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    4.00   15.00   28.64   46.00  100.00
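The two summaries above differ only in whether scores of 0 are kept. A minimal sketch on a made-up score vector (the values are illustrative, not taken from the dataset):

```r
# Made-up scores: mostly zeros plus a few damaged buildings
scores <- c(rep(0, 16), 2, 15, 46, 100)

summary(scores)                 # dominated by the zeros

damaged <- scores[scores > 0]   # keep only buildings with damage
summary(damaged)
```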

Models

## # Comparison of Model Performance Indices
## 
## Name  | Model |   AIC (weights) |  AICc (weights) |   BIC (weights) |    R2 |   RMSE |  Sigma
## ---------------------------------------------------------------------------------------------
## mods1 |   glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods2 |   glm | 26272.9 (<.001) | 26272.9 (<.001) | 26314.3 (<.001) | 0.095 | 28.657 | 28.689
## mods3 |   glm | 26094.1 (>.999) | 26094.1 (>.999) | 26147.3 (>.999) | 0.147 | 27.816 | 27.857
## mods4 |   glm | 30925.4 (<.001) | 30925.5 (<.001) | 30968.0 (<.001) | 0.156 | 28.795 | 28.821
## mods5 |   glm | 30773.9 (<.001) | 30773.9 (<.001) | 30828.6 (<.001) | 0.176 | 28.402 | 28.437

Out of the models I made, Model 3 appeared to work best, with the lowest AIC and BIC. It should be noted, though, that none of these models fit particularly well with the variables used: R2 ranges from 0.095 to 0.176 across the table.
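For reference, a comparison of this kind can be reproduced with base R alone. The sketch below uses the built-in mtcars data as a stand-in (it is not the tornado dataset), and the model names m1 and m2 are hypothetical. It checks nobs() first, since AIC and BIC are only comparable between models fit to the same observations.

```r
# Stand-in data: built-in mtcars, not the tornado dataset
m1 <- glm(mpg ~ wt,      family = gaussian(link = "identity"), data = mtcars)
m2 <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = mtcars)

# Information criteria are only comparable on the same rows
stopifnot(nobs(m1) == nobs(m2))

# For a Gaussian identity-link glm, 1 - deviance / null deviance is R2
r2 <- function(fit) 1 - fit$deviance / fit$null.deviance

comparison <- data.frame(
  model = c("m1", "m2"),
  AIC   = c(AIC(m1), AIC(m2)),
  BIC   = c(BIC(m1), BIC(m2)),
  R2    = c(r2(m1), r2(m2))
)
comparison
```

Rows with missing predictors are dropped per model, so two models with different predictors can silently end up with different sample sizes; the nobs() check catches that case.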

Model 3

## 
## Call:
## glm(formula = catastrophescore ~ long + roofmateri + rooftree + 
##     enclosure, family = gaussian(link = "identity"), data = extra)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -75.62  -18.17   -8.69   13.16   82.23  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        4460.13497 1149.62714   3.880 0.000107 ***
## long                 49.03688   12.76374   3.842 0.000124 ***
## roofmaterishingle   -23.29005    1.43088 -16.277  < 2e-16 ***
## roofmaterimembrane   12.42319    2.18262   5.692 1.37e-08 ***
## roofmaterishake     -21.90493    4.73851  -4.623 3.94e-06 ***
## roofmateritile      -22.84290    7.71403  -2.961 0.003087 ** 
## rooftree              0.56774    0.06876   8.257  < 2e-16 ***
## enclosureTRUE        44.80193   10.78228   4.155 3.34e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 808.6766)
## 
##     Null deviance: 3157376  on 3226  degrees of freedom
## Residual deviance: 2603130  on 3219  degrees of freedom
##   (10 observations deleted due to missingness)
## AIC: 30774
## 
## Number of Fisher Scoring iterations: 2
##                GVIF Df GVIF^(1/(2*Df))
## long       1.010340  1        1.005157
## roofmateri 1.020779  4        1.002574
## rooftree   1.012753  1        1.006356
## enclosure  1.004155  1        1.002076

Root mean squared error for Model 3

## [1] 27.41181
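RMSE is the square root of the mean squared difference between actual and predicted scores. A minimal sketch with a hypothetical helper rmse() and made-up numbers:

```r
# Hypothetical helper: root mean squared error, ignoring missing pairs
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Made-up values for illustration
actual    <- c(100, 60, 15, 4)
predicted <- c(70, 50, 25, 14)

rmse(actual, predicted)
```

Because RMSE is in the same units as the catastrophe score, it can be read directly as a typical prediction error in score points.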

Predictions:

Based on Model 3, I have made model predictions:

Here is a comparison of the predicted vs the actual catastrophe score:

I then plotted the predicted catastrophe scores alongside the actual catastrophe scores for reference.
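A sketch of how predicted and actual values can be compared for a Gaussian glm, again using the built-in mtcars data as a stand-in. The train/test split and the object names are hypothetical. Points close to the dashed 1:1 line are well predicted.

```r
# Stand-in data and a simple split (illustrative only)
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

fit  <- glm(mpg ~ wt + hp, family = gaussian(link = "identity"), data = train)
pred <- predict(fit, newdata = test, type = "response")

results <- data.frame(actual = test$mpg, predicted = unname(pred))
results

# Predicted vs. actual with a 1:1 reference line (drawn in interactive sessions)
if (interactive()) {
  plot(results$actual, results$predicted,
       xlab = "Actual", ylab = "Predicted")
  abline(a = 0, b = 1, lty = 2)
}
```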


Interpretations:

The variables included in this dataset were of limited use for predicting catastrophe scores accurately, as the graph above illustrates. More information would need to be considered, specifically data on the tornadoes themselves.


Graphs: